Systems and methods for using video metadata to associate advertisements therewith

ABSTRACT

A system for using metadata from a video signal to associate advertisements therewith, comprising (i) a segmentation system to divide the video signal into video clips, (ii) a digitizing system for digitizing the video clips, (iii) a feature extraction system for extracting audio and video features from each video clip, associating each audio feature with respective video clips, associating each video feature with respective video clips, and saving the audio and video features into an associated metadata file, (iv) a web interface to the feature extraction system for receiving the video clips, and (v) a database, wherein video signals and associated metadata files are stored and indexed, wherein the associated metadata file is provided when a video player requests the corresponding video signal, enabling selection of a relevant advertisement for presentment in conjunction with respective video clips based on the associated audio and video features of the respective video clip.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 12/206,622, filed Sep. 8, 2008, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application Ser. No. 60/970,593, entitled “Systems and Methods for Using Video Metadata to Associate Advertisements Therewith,” filed Sep. 7, 2007. The entire contents of the above mentioned applications are hereby incorporated by reference for all purposes as if fully set forth herein. The applicant(s) hereby rescind any disclaimer of claim scope in the parent application(s) or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent application.

TECHNICAL FIELD

The present invention relates generally to targeted advertisements and, more particularly, to methods and systems for delivering targeted advertisements in association with a video program based on metadata associated with the video program.

BACKGROUND

An advertisement promotes the goods, services, organizations, ideas, etc. of an organization or company via a medium. Traditional advertisements were made on printed materials and were available on pamphlets, flyers, billboards, posters, newspapers, and magazines. As electronic technology developed, commercials were incorporated into multimedia content, such as radio, television, and movies, and were typically presented as an interruption of the primary content, occurring either before the primary content or at intervals during the primary content. Today, advertisements are placed within television programs and movies through product placements and are available on the Internet and on electronically stored content (e.g., DVDs), such as in commercials, trailers, and in promotions on DVDs.

Traditional advertisements have typically targeted general audiences. Such advertisements can be tailored somewhat to the audience likely to be watching a movie, television program or show or event, or radio station or program based on the general content of the program or show and based on the likely demographic of the audience who would be expected to watch such program or show. The Internet provides advertisers with a more specific targeted audience and, hence, higher potential return on their advertisement expenses. For example, because each computer contains potentially trackable and usable information about user(s) of that computer (e.g., through the use of cookies, location information, language settings, and prior web sites accessed), Internet websites are able to use such information to generate banner or pop-up advertisements that are based on some information available about potential users of each computer. In another example, Internet search engine sites are able to “sell” the terms or keywords used by an Internet searcher to present targeted advertisements that have been associated with specific keywords or search terms. Such advertisements are presented in pop-up windows, banner advertisement windows, or as “sponsored” links to websites that have requested and paid for prominent placements on the search results screen for specific keywords or search terms. An Internet user who searches “keywords” is more likely than a member of the general public to be a potential customer of goods or services associated with such keywords.

With the continuing advance of technology, bandwidth, and availability of broadband access, online video viewing is becoming increasingly popular and promises to become even more prevalent with the continuing expansion and use of IPTV and video on demand. Unlike static or substantially-static content (text, photographs) that is typically available on a webpage, that gets updated only periodically (more frequently for a news webpage and much less frequently for a standard company webpage), and that sustains a particular viewer for only a brief amount of time, commercial videos over the Internet provide an opportunity to capture a viewing audience for a substantially longer amount of time. However, audiences that are used to watching movies and television on DVDs or off of a DVR are unwilling to view conventional advertisements that interrupt the flow of the video stream.

For these and many other reasons, there is a need for a technology platform that is able to provide and display advertisements that are targeted to the specific audience and that are tied to specific programming being viewed. There is a need for methods and systems that enable such advertisements to be viewed selectively and simultaneously with the primary content in such a way that does not interfere with the primary content. There are yet further needs for methods and systems that provide real-time advertisements for the viewer regardless of whether the viewer is accessing the content from off of the Internet or from a DVD or similar electronic media storage if the display device has access to the Internet.

Therefore, it is apparent that a heretofore unaddressed need exists in the art to address the aforementioned deficiencies and inadequacies.

SUMMARY

The present invention, in one aspect, relates to a method for using metadata from a video signal to associate advertisements therewith. In one embodiment, the method includes (i) segmenting the video signal into a plurality of video clips, (ii) extracting audio and video features from a video signal, (iii) digitizing the plurality of video clips, (iv) identifying extracted audio features within respective digitized video clips using audio processing, wherein each audio feature is associated with the respective digitized video clip, (v) identifying extracted video features within respective digitized video clips using visual processing, wherein each video feature is associated with the respective digitized video clip, (vi) saving the associated audio features and associated video features in a metadata file, (vii) associating the metadata file with the video signal, (viii) storing the metadata file in a database, and (ix) providing the associated metadata file when a video player requests the corresponding video signal. The associated metadata file enables selection of a relevant advertisement for presentment in conjunction with each respective digitized video clip of the corresponding video signal based on the associated audio features and the associated video features of the respective digitized video clip.
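
By way of illustration only, the association of extracted features with respective video clips in steps (iv) through (vi) can be sketched as follows. This is a minimal sketch under stated assumptions and not the method of the disclosure: the function name associate_features, the (start, stop) clip tuples, and the (time, kind, label) feature tuples are all hypothetical, and the features are assumed to have been extracted and time-stamped already.

```python
def associate_features(clips, features):
    """Group time-stamped features under the clip whose time range contains them.

    clips    -- list of (start, stop) pairs in seconds (hypothetical layout)
    features -- list of (time, kind, label), where kind is "audio" or "video"
    Returns one record per clip, ready to be saved into a metadata file.
    """
    records = [{"start": a, "stop": b, "audio": [], "video": []} for a, b in clips]
    for time, kind, label in features:
        for record in records:
            # A feature belongs to the clip covering its time stamp.
            if record["start"] <= time < record["stop"]:
                record[kind].append(label)
                break  # clips are assumed not to overlap
    return records
```

A feature whose time stamp falls outside every clip is simply left unassociated in this sketch.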

The video features include at least one of (i) one or more people, (ii) one or more characters, (iii) one or more animals, (iv) one or more objects, (v) one or more geographic locations, (vi) background, (vii) one or more scenes, or a combination of these features. In one embodiment, these video features are extracted by a visual processing system of the feature extraction system. In another embodiment, the method includes the step of identifying and recognizing one or more objects from the video signal by an object classification system of the feature extraction system. In yet another embodiment, the method includes the step of identifying and recognizing one or more scenes from the video signal by a scene classification system of the feature extraction system. In yet another embodiment, the method includes a combination of both steps.

In one embodiment, the video signal may contain an accompanying audio signal. Audio features of the audio signal include at least one of (i) a list of one or more words, (ii) speeches by one or more people, (iii) dialogue by one or more people, (iv) music, (v) background sound, and a combination of these audio features. In another embodiment, the method further includes the steps of: (i) identifying and recognizing one or more background sounds from the audio signal by using a sound classification system of the feature extraction system, (ii) identifying and recognizing one or more music segments from the audio signal by using a music classification system of the feature extraction system, and (iii) identifying and recognizing human speech, dialogues, one or more words, and one or more phrases by using a speech recognition system of the feature extraction system. In yet another embodiment, the method further includes the steps of: (i) collecting audio features of the audio signal by using an audio signal recognition system of the feature extraction system, and (ii) saving the collected audio features in the metadata file.

In one embodiment, the metadata file is an XML file. The metadata file contains one or more of (i) video identification information, (ii) a file name, (iii) a digital signature, (iv) the length of the video signal, (v) a keyword list, (vi) a time-coded transcript, (vii) one or more segments with a corresponding start and stop time, (viii) one or more contents, (ix) one or more characters, (x) one or more animals, (xi) one or more objects, and (xii) a list of vocabulary.
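
For illustration, one way such an XML metadata file might be assembled is sketched below using the Python standard library. The element and attribute names (video, fileName, length, segments, segment, keyword, object) are assumptions made for this sketch only; they are not taken from the exemplary file of FIGS. 5A and 5B.

```python
import xml.etree.ElementTree as ET

def build_metadata(vid, file_name, length_seconds, segments):
    """Serialize time-coded metadata for one video signal as an XML string.

    segments -- list of dicts with hypothetical keys "start", "stop",
                "keywords" (list of str) and "objects" (list of str)
    """
    root = ET.Element("video", {"id": vid})
    ET.SubElement(root, "fileName").text = file_name
    ET.SubElement(root, "length").text = str(length_seconds)
    container = ET.SubElement(root, "segments")
    for seg in segments:
        # Each segment carries its start and stop time as attributes.
        node = ET.SubElement(
            container, "segment",
            {"start": str(seg["start"]), "stop": str(seg["stop"])},
        )
        for keyword in seg.get("keywords", []):
            ET.SubElement(node, "keyword").text = keyword
        for obj in seg.get("objects", []):
            ET.SubElement(node, "object").text = obj
    return ET.tostring(root, encoding="unicode")
```

Keeping the start and stop times on each segment element is what allows a downstream consumer to tie the keywords and objects to a particular portion of the video signal.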

In another aspect, the present invention relates to a system for using metadata from a video signal to associate advertisements therewith. In one embodiment, the system has (i) a segmentation system for dividing the video signal into a plurality of video clips, (ii) a digitizing system for digitizing the plurality of video clips, (iii) a feature extraction system for extracting audio features and video features from each digitized video clip, associating each audio feature with at least one digitized video clip, associating each video feature with at least one digitized video clip, and saving the audio features and video features into a metadata file associated with the video signal, (iv) a web interface to the feature extraction system for receiving the digitized video clips, and (v) a database accessible by a third party user, wherein video signals and associated metadata files are stored and indexed with a unique filename for each video signal in the database and its corresponding video signal. The associated metadata file is provided when a video player requests the corresponding video signal, and enables selection of a relevant advertisement for presentment in conjunction with each respective digitized video clip of the corresponding video signal based on the associated audio features and the associated video features of the respective digitized video clip.

In one embodiment, the video features comprise at least one of (i) one or more people, (ii) one or more characters, (iii) one or more animals, (iv) one or more objects, (v) one or more geographic locations, (vi) background, (vii) one or more scenes, and (viii) any combination thereof. In another embodiment, the video signal includes an accompanying audio signal.

In another embodiment, the audio features of the audio signal comprise one or more of (i) a list of one or more words, (ii) speeches by one or more people, (iii) dialogue by one or more people, (iv) music, (v) background sound, and (vi) any combination thereof. In one feature, the feature extraction system further comprises an audio signal recognition (ASR) system to identify and recognize the audio features of the video signal, and a visual processing system to identify and recognize the visual features of the video signal. In another feature, the visual processing system further comprises an object classification system to identify and recognize one or more objects from the video signal, and a scene classification system to identify and recognize one or more scenes from the video signal. In yet a further feature, the audio signal recognition system further comprises a sound classification system to identify and recognize one or more background sounds from the audio signal, a music classification system to identify and recognize one or more music segments from the audio signal, and a speech recognition system to identify and recognize human speech, dialogues, one or more words, and one or more phrases. In another feature, the metadata file comprises one or more of video identification information, a file name, a digital signature, the length of the video signal, a keyword list, a time-coded transcript, one or more segments with a corresponding start and stop time, one or more contents, one or more characters, one or more pets, one or more objects, and a list of vocabulary.

These and other aspects of the present invention will become apparent from the following description of the preferred embodiment taken in conjunction with the following drawings, although variations and modifications therein may be effected without departing from the spirit and scope of the novel concepts of the disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate one or more embodiments of the invention and, together with the written description, serve to explain the principles of the invention. Wherever possible, the same reference numbers are used throughout the drawings to refer to the same or like elements of an embodiment, and wherein:

FIG. 1A illustrates a first embodiment of an advertisement placement system of the present invention;

FIG. 1B illustrates a second embodiment of an advertisement placement system of the present invention;

FIG. 1C illustrates a third embodiment of an advertisement placement system of the present invention;

FIG. 1D illustrates a fourth embodiment of an advertisement placement system of the present invention;

FIG. 1E illustrates a fifth embodiment of an advertisement placement system of the present invention;

FIG. 1F illustrates a sixth embodiment of an advertisement placement system of the present invention;

FIG. 2 illustrates one representative display screen for viewing a video program and advertisements associated therewith based on underlying time-coded metadata;

FIG. 3 illustrates a high level intake system for receiving video files and generating underlying time-coded metadata;

FIG. 4 illustrates a more detailed flow chart describing the extraction of metadata from a video signal; and

FIGS. 5A and 5B illustrate one exemplary metadata file generated and used within the present invention.

DETAILED DESCRIPTION

The present invention is more particularly described in the following examples that are intended as illustrative only since numerous modifications and variations therein will be apparent to those skilled in the art. Various embodiments of the invention are now described in detail. Referring to the drawings, like numbers indicate like components throughout the views. As used in the description herein and throughout the claims that follow, the meaning of “a”, “an”, and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

The terms used in this specification generally have their ordinary meanings in the art, within the context of the invention, and in the specific context where each term is used.

Certain terms that are used to describe the invention are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the apparatus and methods of the invention and how to make and use them. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way. Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, and no special significance is to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification, including examples of any terms discussed herein, is illustrative only, and in no way limits the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to various embodiments given in this specification. Furthermore, subtitles may be used to help a reader read through the specification; the usage of subtitles, however, has no influence on the scope of the invention.

As used herein, a video program refers to any multimedia content, such as a movie, a television program, an event, a video, an advertisement, a broadcast, or the like that a user would be interested in viewing online or in recorded format.

Turning now to FIG. 1A, a first preferred embodiment of an advertisement placement system 100A based primarily upon time-coded metadata associated with an underlying video program displayed therewith is illustrated. In this first embodiment, it is contemplated that the video program will be viewed in a Video-on-Demand (VOD) or video streaming context from a video provider 110 and that the underlying metadata file associated with the video program, once created, is maintained by the video provider 110. This system 100A includes one or more video storage databases 115 of the video provider 110 and a video server 113 that provides video programs in VOD or video streaming format over a computer network, such as the Internet for example, to a viewer 150 (or end user).

Before a specific video is provided to the viewer 150, a video file 120 associated with the video program 121 is preferably provided to a metadata generator 130. The video file 120 has or includes a unique file name or other video identifier (designated herein by the variable VID). As will be described in greater detail hereinafter, the metadata generator 130 receives the video file 120 and, using a metadata processor 133, creates or generates a time-coded metadata file 125 associated with the corresponding video file 120 and underlying video program 121. As shown in FIG. 1A, this metadata file 125 is stored in a database 135 of the metadata generator 130 but also provided back to the video provider 110 and associated with the corresponding video file 120 and underlying video program 121 in video storage databases 115.

When a request 140 for VOD or video streaming of the video program 121 associated with the video file 120 is received from a video display device 155 (such as a computer, Internet or interactive TV, or similar video playback or viewing device) of the viewer 150, the video provider 110 begins providing access to the video program 121 in conventional fashion (i.e., this assumes all communication and billing parameters are already or previously satisfied; such communication and billing parameters being beyond the scope of the present invention but within the scope and understanding of those skilled in the art). Simultaneously or substantially simultaneously with the start of the video streaming, the metadata file 125 associated with the video file 120 is provided to an advertisement distributor 160, which uses an advertisement server 163 to process the metadata file 125 to selectively identify one or more appropriate advertisements from its database 165 of potential advertisements that are appropriate to provide in conjunction with the video program 121 and, specifically, with each discrete segment of the video program 121 based on its time-coded metadata. The selected advertisement file(s) 175 are then provided to the video display device 155 of the viewer 150. The metadata file 125 may be provided in whole to the advertisement distributor 160 or it may be parsed and provided in piecemeal or “as needed” fashion to the advertisement distributor 160.
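
The selection performed by the advertisement server 163 is not limited to any particular matching technique. Purely as an illustrative sketch, the following assumes that each time-coded segment and each candidate advertisement carries a keyword list and picks, for every segment, the advertisement sharing the most keywords. The function name select_ads and the dictionary keys are hypothetical, and simple keyword overlap is a stand-in chosen for this sketch, not a technique stated in the disclosure.

```python
def select_ads(segments, ad_inventory):
    """Pick one advertisement per time-coded segment by keyword overlap.

    segments     -- list of dicts with "start", "stop" and "keywords"
    ad_inventory -- list of dicts with "id" and "keywords"
    Returns a list of (start, stop, ad_id); ad_id is None when nothing matches.
    """
    schedule = []
    for seg in segments:
        seg_terms = {term.lower() for term in seg["keywords"]}
        best_id, best_score = None, 0
        for ad in ad_inventory:
            ad_terms = {term.lower() for term in ad["keywords"]}
            score = len(seg_terms & ad_terms)  # number of shared keywords
            if score > best_score:
                best_id, best_score = ad["id"], score
        schedule.append((seg["start"], seg["stop"], best_id))
    return schedule
```

Because the result is keyed by start and stop time, the same schedule can be delivered in whole or handed out one segment at a time.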

Preferably, as shown in FIG. 2, the video display 200 of the video display device 155 is configured to receive and display various types of advertisements in conjunction with the actual video display. Such advertisements are preferably displayed in manners that do not interrupt or delay viewing of the requested video, as would a conventional commercial shown on broadcast television. For example, such advertisements may be shown as conventional banner ads that appear (i) in an optional vertical side window or panel 205 or (ii) in an optional horizontal window or panel 210 that do not interfere with the main video display area 225. The video display 200 may also include conventional header areas and menu control areas 215, 220. Obviously, the placement and purpose of each of the windows and panels of the video display 200 are within the purview of those skilled in the art. In addition, although the advertisements can be displayed as banner ads, it is also possible and expected that alternative advertisements, such as interstitial ads, bug ads, or hyperlinks that can be opened or accessed by the viewer 150, may be used alternatively or in conjunction with the banner advertisements. Such additional advertisements may be placed within the main video display area 225, such as in the lower portion 230 of the main video display area 225. In addition, it should also be understood that while “advertisements” are being used generally to define the information that may be displayed around and during the video playback, it is also possible and expected that other information associated with the video playback, such as background information, more detailed information about the video program, the actors in the video program, scenes, events or locations that appear in the video program, related videos or information, and the like, can be displayed, advertised, or linked during the playback and tied to the current time-code of the video as it is being viewed.
Such advertisements can be text, still graphics, videos, audio, hyperlinks, or the like. In some embodiments, the advertisements merely display information. In other embodiments, the advertisements include a hyperlink that, when activated, pauses the primary video and allows the user to view or access the advertisement or other additional information.

Although not shown in FIG. 1A, as the video program 121 is being viewed by the viewer 150, the actual time-code of the video program 121 is provided either from the video provider 110 or, preferably, from the video display device 155 itself. This time-code information associated with the actual viewing of the video program 121 either is provided in real time to the advertisement distributor 160 so that appropriate advertisement files 175 can be provided back to the video display device 155 in real time or, alternatively, is provided in advance to the video display device 155 for caching and later access at the appropriate time, based on the time-code location of the video program 121 as it is being viewed.
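
As a hedged illustration of how a current time-code might be resolved to the time-coded segment being viewed, the sketch below performs a binary search over segment start times. It assumes the segments are sorted by start time and do not overlap; the function name segment_at and the dictionary keys are hypothetical.

```python
from bisect import bisect_right

def segment_at(segments, time_code):
    """Return the segment whose [start, stop) range contains time_code.

    segments -- list of dicts with "start" and "stop" in seconds, sorted by
                "start" and non-overlapping (an assumption of this sketch)
    Returns None when the time-code falls outside every segment.
    """
    starts = [seg["start"] for seg in segments]
    index = bisect_right(starts, time_code) - 1  # last segment starting at or before time_code
    if index >= 0 and time_code < segments[index]["stop"]:
        return segments[index]
    return None
```

The same lookup serves either arrangement described above: it can run at the advertisement distributor for each reported time-code, or on the video display device against advertisement files cached in advance.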

In an optional embodiment of that shown in FIG. 1A, user or video display device characteristics 185 are obtainable from the video display device 155 and may be provided to the advertisement distributor 160. Such user or video display device characteristics 185 typically include location, age, gender, interests, Internet websites visited, and other similar demographic data that may be obtained from cookies or similar tracking information. The advertisement distributor 160 utilizes the video display device characteristics 185 to generate advertisement files 175 that are further targeted and customized for the viewer 150. The advertisement files 175 are provided back to the video display device 155 for display during viewing of the video program 121. Such targeted advertisements are still shown and synchronized with the time-coded video program 121; however, the user data 185 enables the advertisement server 163 to select more accurately between one or more potentially valid advertisements that could be associated with the video program for a particular time-coded segment.
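
One simple way the user data 185 could be used to choose among several advertisements already valid for a segment is sketched below. The "targets" dictionary on each advertisement, the "location" and "interests" keys, and the additive scoring are all assumptions made for illustration; the disclosure does not prescribe a scoring scheme.

```python
def viewer_score(ad, viewer):
    """Count how many of an advertisement's targeting criteria the viewer meets."""
    targets = ad.get("targets", {})
    score = 0
    if "location" in targets and targets["location"] == viewer.get("location"):
        score += 1
    # One point for every interest shared between the ad and the viewer.
    score += len(set(targets.get("interests", ())) & set(viewer.get("interests", ())))
    return score

def pick_for_viewer(candidates, viewer):
    """Choose among ads already matched to a segment; ties keep the earliest candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda ad: viewer_score(ad, viewer))
```

Note that the candidates passed in are already tied to the time-coded segment, so the viewer characteristics only refine the choice and never override the association with the video program.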

Turning now to FIG. 1B, a second preferred embodiment of an advertisement placement system 100B based primarily upon time-coded metadata associated with an underlying video program displayed therewith is illustrated. In this embodiment, it is contemplated that the video program is still viewed in a Video-on-Demand (VOD) or video streaming context from a video provider 110; however, unlike the first embodiment, in this scenario, the underlying metadata file 125 associated with the video program 121 is sent to the video display device 155 along with the video program 121. The video display device 155 (or at least the video player system/software installed on the video display device) then sends the metadata file 125 (or parsed segments thereof at appropriate time intervals) to the advertisement distributor 160 so that appropriate advertisement file(s) 175 are returned back to the player 155. Again, optionally, user data 185 may be provided from the video display device 155 to the advertisement distributor 160 to enable the advertisement server 163 to select more accurately between one or more potentially valid advertisements that could be associated with the video program for a particular time-coded segment.

Turning now to FIG. 1C, a third preferred embodiment of an advertisement placement system 100C based primarily upon time-coded metadata associated with an underlying video program displayed therewith is illustrated. In this embodiment, it is contemplated that the video program is also viewed in a Video-on-Demand (VOD) or video streaming context from a video provider 110; however, unlike the first and second embodiments, in this scenario, the underlying metadata file 125 associated with the video program 121, once created, is not provided back to the video provider 110. This embodiment is similar in most respects to the first embodiment; however, when a request for video 140 is received from the viewer 150, the video provider 110 sends a request 145 for metadata file 125 associated with the video program 121. This request 145 is either sent directly to the metadata generator 130 (as shown) or (as not shown) to the advertisement distributor 160 first, which then requests the same from the metadata generator 130. The metadata generator 130 then retrieves the appropriate time-coded metadata file 125 from its database 135 and provides it to the advertisement distributor 160. The remaining aspects, variations, and alternatives of this embodiment are similar to those discussed in association with the first embodiment.

Turning now to FIG. 1D, a fourth embodiment is illustrated, which is another variation of the embodiment shown in FIG. 1C. Again, the metadata file 125 is maintained by the metadata generator 130, but upon receipt of a request 145, this time from the video display device 155, the metadata file 125 is provided to the video display device 155 and provided, preferably on a parsed or “as needed” basis, to the advertisement distributor 160. In another slight alternative arrangement, in response to the request 145 (or series of requests containing the video ID and time code location) from the video display device 155, the metadata generator 130 may provide the “as needed” portion of the metadata file 125 to the advertisement distributor 160 corresponding to the video segment being viewed by the viewer 150.

Turning now to FIG. 1E, a fifth preferred embodiment of an advertisement placement system 100E based primarily upon time-coded metadata associated with an underlying video program displayed therewith is illustrated. In this embodiment, it is contemplated that the video program is actually provided or sold to the viewer 150 on a DVD 117 or similar storage medium, or is provided as a file download (not shown) (as opposed to a mere video streaming in which the file is not actually downloaded) for later playback. This embodiment is similar to the first embodiment to the extent that the underlying metadata file associated with the video program, once created, is maintained by the video provider 110. This system 100E includes one or more video storage databases 115 of the video provider 110 and a video manager 116 that communicates with the metadata generator 130, the advertisement distributor 160, and the video storage databases 115, and which manages the production of stored video programs 117 for distribution in DVD format or for download or the like.

Similar to the first embodiment, before a stored video program 117 is created and made available to a viewer 150, a video file 120 associated with the stored video program 117 is preferably provided to the metadata generator 130. The video file 120 has or includes a unique file name or other video identifier (designated herein by the variable VID). As will be described in greater detail hereinafter, the metadata generator 130 receives the video file 120 and, using a metadata processor 133, creates or generates a time-coded metadata file 125 associated with the corresponding video file 120 and underlying video program 121. This metadata file 125 is stored in a database 135 of the metadata generator 130 but is also provided back to the video provider 110 and associated with the corresponding video file 120 in video storage databases 115.

As part of the process for creating a stored video program 117, the metadata file 125 associated with the video file 120 is provided to the advertisement distributor 160, which uses an advertisement server 163 to process the metadata file 125 to selectively identify one or more appropriate advertisements from its database 165 of potential advertisements that are appropriate to provide in conjunction with the stored video program 117 and, specifically, with each discrete segment of the stored video program 117 based on its time-coded metadata. The selected advertisement file(s) 175 are then provided back to the video provider 110, which incorporates the advertisement files 175 directly on the stored video program 117 along with the actual video file 120. In this manner, the stored video program 117 has all necessary and desired advertisement files 175 built into the stored video program 117 and plays advertisements during viewing of the video in situations in which the video display device 155 does not (whether intentionally, unintentionally, because of incompatibility, or for whatever reason) have real time access to the Internet to obtain real-time advertisements associated with the video. The remaining aspects, variations, and alternatives of this embodiment are similar to those discussed in association with the first embodiment.

Turning now to FIG. 1F, a sixth preferred embodiment of an advertisement placement system 100F based primarily upon time-coded metadata associated with an underlying video program displayed therewith is illustrated. This embodiment is similar to the fifth embodiment; however, it is contemplated that the video display device 155 has access to the Internet and, thus, is able to obtain real-time advertisement files 175 from advertisement distributor 160. This arrangement is preferred to the fifth embodiment since advertisements associated with the video program are not fixed and unchangeable on the stored video program 117 media. Instead, over time and with each viewing of the stored video program 117, the viewer 150, potentially, has a new advertisement experience.

For this reason, it is desirable to have the time-coded metadata file 125 actually stored on the stored video program 117 along with the video file 120 so that when the video program is actually being viewed by the viewer 150 on the video display device 155, the video display device 155 initiates a communication with the advertisement distributor 160 to provide the time-coded metadata file 125 and to receive back appropriate advertisement file(s) 175. Again, in an alternative arrangement, it may be desirable for the viewer 150 to provide or for the advertisement distributor 160 to have user or video display device characteristics 185 (as described in greater detail previously) so that the advertisement files 175 associated with the time-coded metadata of the stored video program 117 are tailored and targeted slightly more at the viewer 150, but still associated with the appropriate segment of the video program.

In an additional, alternative embodiment (not shown), the embodiments shown in FIGS. 1E and 1F are combined to store a base set of advertisement files 175 on the stored video program 117. The advertisement files 175 are for situations in which the video display device 155 is “offline” and does not have access to the Internet. The video display device 155 (or the software associated with the stored video program 117) is configured to interact in real time with the advertisement distributor 160 to obtain current and up-to-date advertisement files 175 when it is actually able to access the Internet and communicate with the advertisement distributor 160. In such a scenario, the more up-to-date advertisement files 175 are shown during the video playback if they are available. If they are not available, the pre-stored, base advertisement files 175 are used.
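By way of non-limiting illustration, the fallback behavior of this combined embodiment can be sketched as follows. The function name `choose_advertisement`, the `fetch_current` callable standing in for the request to the advertisement distributor 160, and the dictionary of base advertisements are all hypothetical and are not taken from the disclosure.

```python
from typing import Callable, Dict, Optional


def choose_advertisement(
    segment_id: str,
    base_ads: Dict[str, str],
    fetch_current: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Prefer an up-to-date advertisement; fall back to the pre-stored one.

    `fetch_current` is a hypothetical stand-in for a real-time request to the
    advertisement distributor. It may return None or raise OSError (which
    includes ConnectionError) when the display device is offline.
    """
    try:
        current = fetch_current(segment_id)
    except OSError:
        # Treat any network failure as the "offline" case.
        current = None
    if current is not None:
        return current
    # Offline or nothing newer available: use the base advertisement, if any.
    return base_ads.get(segment_id)
```

In this sketch, a segment that has neither a current nor a base advertisement simply yields `None`, leaving it to the player to show no advertisement for that segment.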

It should also be understood that there are many other alternative arrangements and variations of how and where various files are stored and provided. The embodiments shown in FIGS. 1A through 1F represent just some of the more likely arrangements and components involved. Additionally, multiple other parties may be involved such that the roles and responsibilities for providing and receiving files, for processing files, and for exchanging and storing data can be handled by different parties or components. For example, there may be two separate parties or components used to generate time-coded metadata files and to store and provide such time-coded metadata files to third parties upon request. Likewise, the video provider may want to act as a pass-through for the advertisement distributor so that the video display device never interacts directly with a specified advertisement distributor. This can be controlled more easily in the video streaming context, since the links to the advertisement distributor can be dynamically changed over time to point to the preferred or desired advertisement distributor associated with the video provider. For the stored video program embodiments, it may be desirable to have advertisement links that go back through the video provider. This would enable the video provider to update and change the advertisement distributor used over longer periods of time and prevent such links, which are hard-coded onto the stored video program, from becoming obsolete or broken.

FIGS. 3 through 5B provide more detailed explanations of the creation of time-coded metadata files associated with underlying video programs. Turning first to FIG. 3, a high level view 300 of the intake process for creating a time-coded metadata file 125 is described. The metadata generator 130 receives the video file 120, which has or includes a unique file name or other video identifier (designated herein by the variable VID). The video file 120 and identifier are stored initially in a SOAP database 310. Preferably, the video file 120 is received in .mp4 (MPEG 4) format or, if not, is converted to such (or similar) format, as may be changed or updated from time to time. A hash of this file is computed to generate a unique “video signature,” which is checked against the existing video signatures stored in the database 135 to determine whether a time-coded metadata file already exists for the video file 120 received. If so, the appropriate time-coded metadata file 125 is provided to any requesting party. If not, the video file in .mp4 format is then provided to an audio processor 320 and to a video processor 330. Once the audio and video have been parsed and used to identify underlying metadata of the video program, the time-coded metadata file is stored in database 135 and is available for distribution or use, as described in association with the embodiments of FIGS. 1A through 1F.
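For purposes of illustration only, the signature check described above might be sketched as shown below. The disclosure does not name a particular hash function; SHA-256 is assumed here, and the function names and the in-memory dictionary standing in for the database 135 are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Optional


def video_signature(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a video file in chunks so large files need not fit in memory.

    SHA-256 is an assumption; the disclosure only says "a hash".
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def find_existing_metadata(
    stream: BinaryIO, known_signatures: Dict[str, str]
) -> Optional[str]:
    """Return the stored metadata file reference for this video, if any.

    `known_signatures` is a hypothetical stand-in for the signature index
    of the metadata database (signature -> metadata file reference).
    """
    return known_signatures.get(video_signature(stream))
```

A result of `None` corresponds to the case in which no time-coded metadata file exists yet, so the video would proceed to the audio and video processors.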

FIG. 4 illustrates, in more detail, the steps 400 performed by the metadata processor 133 when a video file 120 is received for intake and processing. First, the video identifier is obtained from the video provider (step 410). As stated previously, this identifier may simply be the file name for the video file or it may be the title and year of the video file or something similar. Based on this identifier, it is possible to determine whether this particular video had been previously processed (step 412). If the video had already been processed previously, the database storing such time-coded metadata file is updated and cross-referenced with the identifier and existing video signature (step 430), then the process jumps to step 480 and determines whether another video needs to be processed or not. If the video had not been processed previously, at least based on its identifier, the system then downloads or receives the full video file from the video provider for further processing (step 415). As part of step 415, the video file is converted to .mp4 (MPEG 4) format, if it is not already in such format, and a hash or “video signature” of the .mp4 version of the video is created. Based on this video signature, it is possible to determine whether this particular video had been previously processed, even if the video identifier did not match a previously-known identifier (step 420). If the video had already been processed previously, the database storing such time-coded metadata files is updated and cross-referenced with the additional identifier and video signature, if necessary (step 430), then the process jumps to step 480 and determines whether another video needs to be processed or not. If the video had not been processed previously, it is then submitted to an audio processor (step 440) for audio capture and separation and speech recognition (among other things) and a video processor (step 450) for classification and analysis (among other things). 
The resulting metadata is compiled, tied to or associated with the underlying timecode of the video program, and stored in the metadata database (step 470). The process then determines whether there is another video to be processed (step 480). If so, the process 400 starts over. If not, the process 400 ends.
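The two-stage check of process 400 (identifier first, then video signature) can be illustrated with the following simplified sketch. The `MetadataStore` class, the `intake` function, the returned status strings, and the use of SHA-256 are all assumptions made for illustration; the audio and video processors are passed in as placeholder callables, and format conversion is omitted.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata database."""
    by_identifier: Dict[str, str] = field(default_factory=dict)  # VID -> signature
    by_signature: Dict[str, dict] = field(default_factory=dict)  # signature -> metadata


def intake(
    vid: str,
    fetch_video: Callable[[str], bytes],
    store: MetadataStore,
    audio_processor: Callable[[bytes], List[str]],
    video_processor: Callable[[bytes], List[str]],
) -> str:
    # Steps 410/412: look the video up by its identifier first.
    if vid in store.by_identifier:
        return "known-identifier"
    # Step 415: obtain the full file and compute its signature.
    data = fetch_video(vid)
    signature = hashlib.sha256(data).hexdigest()
    # Step 420: same content may arrive under a new identifier.
    if signature in store.by_signature:
        # Step 430: cross-reference the additional identifier.
        store.by_identifier[vid] = signature
        return "known-signature"
    # Steps 440/450: run the audio and video processors.
    metadata = {"audio": audio_processor(data), "video": video_processor(data)}
    # Step 470: store the compiled metadata.
    store.by_signature[signature] = metadata
    store.by_identifier[vid] = signature
    return "processed"
```

Calling `intake` once per incoming video plays the role of the loop closed by step 480. The identifier check comes first because it avoids downloading and hashing a file that is already known.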

Generally, when a video program is received or converted to .mp4 format, an underlying time-code exists or is established for the video program. All audio and video metadata identified or extracted from the video program by the metadata processor 133 is then tied to or associated with specific points or regions within the time code. Initially, key identifiers for the video program are determined and identified. These include all characters who appear in the video program, key or recurring scene locations, key props and objects, key terms, etc. The key identifiers are typically audio features and/or video features, and are extracted from the video signal. Then, the video portion of the video program is parsed and divided into “short clips” or discrete segments. Such segments can be specified by a predetermined time frame, but can alternatively be identified based on information within the video signal, such as, for example, a change of camera shots, angles, scene change, scene break or the like. It should also be noted that different video segments can be defined by different predetermined time frames.
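The two segmentation approaches mentioned above can be illustrated, under simplifying assumptions, by the sketch below. Segments are represented as (start, end) pairs in seconds on the time-code timeline. The function names are hypothetical, and the detection of shot or scene changes is assumed to have been performed elsewhere and supplied as a list of cut times.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def segment_by_duration(total: float, clip_length: float) -> List[Segment]:
    """Divide a program into clips of a predetermined length.

    The final clip is shortened if the program length is not a multiple
    of `clip_length`.
    """
    if total <= 0 or clip_length <= 0:
        raise ValueError("total and clip_length must be positive")
    segments: List[Segment] = []
    start = 0.0
    while start < total:
        end = min(start + clip_length, float(total))
        segments.append((start, end))
        start = end
    return segments


def segment_by_cuts(total: float, cut_times: Sequence[float]) -> List[Segment]:
    """Divide a program at externally detected shot or scene changes.

    Cut times outside (0, total) are ignored; duplicates are collapsed.
    """
    inner = sorted({float(t) for t in cut_times if 0 < t < total})
    bounds = [0.0] + inner + [float(total)]
    return list(zip(bounds[:-1], bounds[1:]))
```

Because both functions return the same representation, segments produced with different time frames or by different methods can be handled uniformly by later processing.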

Once the video signal is divided, each segmented video clip is digitized. The breaks between segments are identified and tied to the time-code timeline. Next, the metadata processor 133 runs a language and speech recognition process through the entire video and associates all of the dialogue and background audio with the appropriate video segments and time-codes. Next, characters within the video signal are associated with each of the dialogue entries. Finally, the metadata processor 133 runs a number of visual processing programs to identify characters, objects, and scenes within each segment of the video program. Each identified audio feature is thus associated with at least one segmented video clip. Similarly, each identified video feature is also associated with at least one segmented video clip.
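One possible way to associate time-coded features with segments is by time overlap, as in the following sketch. The overlap rule, the tuple layout, and the function name `associate_features` are assumptions for illustration; the disclosure does not specify how the association is computed.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]       # (start_seconds, end_seconds)
Feature = Tuple[float, float, str]  # (start_seconds, end_seconds, label)


def associate_features(
    segments: Sequence[Segment], features: Sequence[Feature]
) -> Dict[int, List[str]]:
    """Map each segment index to the labels of features overlapping it.

    A feature that spans a segment break is attached to every segment it
    overlaps. Features are assumed to have a positive duration.
    """
    result: Dict[int, List[str]] = {i: [] for i in range(len(segments))}
    for f_start, f_end, label in features:
        for i, (s_start, s_end) in enumerate(segments):
            # Two half-open intervals overlap when each starts before
            # the other ends.
            if f_start < s_end and f_end > s_start:
                result[i].append(label)
    return result
```

Under this rule, a line of dialogue that begins in one clip and ends in the next is attached to both clips.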

The associated metadata file enables selection of a relevant advertisement for presentment in conjunction with each respective digitized video clip of the corresponding video signal based on the associated audio features and the associated video features of the respective digitized video clip. Those of skill in the art will readily appreciate that presentment is typically implemented by a visual display device, but may also include email, file delivery, and other delivery methods.
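As a simplified illustration of how such a selection could operate, the sketch below scores each candidate advertisement by the number of keywords it shares with the features of a segment. This keyword-overlap scoring is an assumption made for illustration and is not the selection method of any particular advertisement distributor; the function name and data layout are hypothetical.

```python
from typing import Dict, Optional, Set


def select_advertisement(
    segment_features: Set[str], ads: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the ad whose keywords best overlap the segment's features.

    `ads` maps a hypothetical advertisement identifier to its keywords.
    Returns None when no advertisement shares any keyword with the segment.
    """
    best_ad: Optional[str] = None
    best_score = 0
    # Iterate in sorted order so ties are broken deterministically.
    for ad_id in sorted(ads):
        score = len(segment_features & ads[ad_id])
        if score > best_score:
            best_ad, best_score = ad_id, score
    return best_ad
```

User or device characteristics, where available, could be folded into the score in the same manner, for example by adding them to the feature set before scoring.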

The video features identified by the visual processor include at least one of (i) people, (ii) characters, (iii) animals, (iv) objects, (v) geographic locations, (vi) background, (vii) scenes, or a combination of any of these features. Preferably, these video features are extracted by a visual processing system of the feature extraction system. In one embodiment, the method includes the step of identifying and recognizing one or more objects from the video signal by an object classification system of the feature extraction system. In another embodiment, the method includes the step of identifying and recognizing one or more scenes from the video signal by a scene classification system of the feature extraction system. In yet another embodiment, the method includes a combination of both steps.

Audio features of the audio signal include at least one of (i) a list of one or more words, (ii) speeches by one or more people, (iii) dialogue by one or more people, (iv) music, (v) background sound, or a combination of these audio features. The method further includes the steps of: (i) identifying and recognizing one or more background sounds from the audio signal by using a sound classification system of the feature extraction system, (ii) identifying and recognizing one or more music segments from the audio signal by using a music classification system of the feature extraction system, and (iii) identifying and recognizing human speech, dialogues, one or more words, and one or more phrases by using a speech recognition system of the feature extraction system. The method further includes the steps of: (i) collecting audio features of the audio signal by using an audio signal recognition system of the feature extraction system, and (ii) saving the collected audio features in the metadata file.

Preferably, the metadata file is in XML format. An exemplary portion of a time-coded metadata file, in XML format, is illustrated in FIGS. 5A and 5B. The metadata file contains one or more of (i) video identification information, (ii) a file name, (iii) a digital signature, (iv) the length of the video signal, (v) a keyword list, (vi) a time-coded transcript, (vii) one or more segments with a corresponding start and stop time, (viii) one or more contents, (ix) one or more characters, (x) one or more animals, (xi) one or more objects, and (xii) a list of vocabulary.
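For illustration, a metadata file containing several of the listed items could be generated as sketched below. The element and attribute names (`video`, `fileName`, `signature`, `length`, `segment`, `start`, `stop`, `character`, `object`) are hypothetical and are not taken from FIGS. 5A and 5B; only the kinds of information are drawn from the list above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_metadata_xml(
    video_id: str,
    file_name: str,
    signature: str,
    length_seconds: float,
    segments: List[Dict],
) -> str:
    """Serialize time-coded metadata to an XML string.

    Each entry of `segments` is a dict with "start" and "stop" times and
    optional "characters" and "objects" lists. All tag names are
    illustrative assumptions.
    """
    root = ET.Element("video", {"id": video_id})
    ET.SubElement(root, "fileName").text = file_name
    ET.SubElement(root, "signature").text = signature
    ET.SubElement(root, "length").text = str(length_seconds)
    container = ET.SubElement(root, "segments")
    for seg in segments:
        node = ET.SubElement(
            container,
            "segment",
            {"start": str(seg["start"]), "stop": str(seg["stop"])},
        )
        for name in seg.get("characters", []):
            ET.SubElement(node, "character").text = name
        for name in seg.get("objects", []):
            ET.SubElement(node, "object").text = name
    return ET.tostring(root, encoding="unicode")
```

Carrying the start and stop times as attributes of each segment keeps every feature tied to its position on the time-code timeline.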

The foregoing description of the exemplary embodiments of the invention has been presented only for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching.

The embodiments were chosen and described in order to explain the principles of the invention and their practical application so as to enable others skilled in the art to utilize the invention and various embodiments and with various modifications as are suited to the particular use contemplated. Alternative embodiments will become apparent to those skilled in the art to which the present invention pertains without departing from its spirit and scope. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description and the exemplary embodiments described therein. 

1-24. (canceled)
 25. A method comprising: extracting, by a server, audio and visual features from each segment of a plurality of segments of a content item; cross-referencing, by the server, a database with an identifier for the content item; updating, at the server, time-based metadata associated with the content item in the database with the audio and visual features of each segment of the plurality of segments of the content item; transmitting, by the server, the content item to a display device; transmitting, by the server, a portion of time-based metadata corresponding to a first segment of the plurality of segments to an advertising distributor that selects a relevant advertisement based at least in part on the portion of time-based metadata corresponding to the first segment; receiving, by the server, the relevant advertisement from the advertising distributor; and transmitting, by the server, the relevant advertisement to the display device and the time-based metadata.
 26. The method of claim 25 wherein transmitting, by the server, the portion of time-based metadata corresponding to the first segment of the plurality of segments to an advertising distributor comprises: determining an object in the first segment; transmitting, by the server, metadata associated with the object.
 27. The method of claim 25 further comprising determining, by the server, a location in the content item to insert an advertisement.
 28. The method of claim 27 further comprising: determining the portion of the time-based metadata to transmit to an advertising distributor based on the determined location in the content item to insert the advertisement.
 29. The method of claim 25 wherein extracting, by a server, audio and visual features from each segment of the plurality of the segments of the content item comprises: identifying, by the server, extracted audio features from the content item using audio processing, wherein the extracted audio features comprise time-code locations of each segment of the plurality of segments within the content item.
 30. The method of claim 29, wherein the extracted audio features include one or more of discrete sounds and background noise.
 31. The method of claim 25 wherein extracting, by a server, audio and visual features from each segment of the plurality of the segments of the content item comprises: identifying, by the server, extracted video features from the content item using video processing, wherein the extracted video features comprise time-code locations of each segment of the plurality of segments within the content item.
 32. The method of claim 31, wherein the extracted video features include one or more of actors, characters, animals, objects, geographic locations, background, setting, theme, events, and scenes.
 33. The method of claim 25, wherein cross-referencing the database with an identifier for the content item comprises: determining that the content item has been processed previously; in response to determining that the content item has been processed previously, cross-referencing the identifier for the content item and an existing video signature.
 34. The method of claim 25 wherein the relevant advertisement is selected, by the advertising distributor, at least in part based on audio and visual features of a segment of the plurality of the segments.
 35. A system comprising: a video provider configured to: extract audio and visual features from each segment of a plurality of segments of a content item; cross-reference a database with an identifier for the content item; update time-based metadata associated with the content item in the database with the audio and visual features of each segment of the plurality of segments; input/output circuitry configured to: transmit the content item to a display device; transmit a portion of the time-based metadata corresponding to a first segment of the plurality of segments to an advertising distributor that selects a relevant advertisement based at least in part on the portion of time-based metadata corresponding to the first segment; receive the relevant advertisement from the advertising distributor; and transmit the relevant advertisement to the display device and the time-based metadata.
 36. The system of claim 35 wherein the input/output circuitry, when transmitting a portion of the time-based metadata corresponding to the first segment of the plurality of segments to an advertising distributor, is further configured to: determine an object in the first segment; transmit, by the server, metadata associated with the object.
 37. The system of claim 35, wherein the video provider is further configured to determine a location in the content item to insert an advertisement.
 38. The system of claim 37, wherein the video provider is further configured to determine the portion of the time-based metadata to transmit to an advertising distributor based on the determined location in the content item to insert the advertisement.
 39. The system of claim 35, wherein the video provider is further configured to, when extracting audio and visual features from each segment of the plurality of the segments of the content item: identify extracted audio features from the content item using audio processing, wherein the extracted audio features comprise time-code locations of each segment of the plurality of segments within the content item.
 40. The system of claim 39, wherein the video provider, when identifying extracted audio features from the content item using audio processing, is further configured to identify one or more of discrete sounds and background noise.
 41. The system of claim 35, wherein the video provider is further configured to, when extracting audio and visual features from each segment of the plurality of segments of the content item: identify extracted video features from the content item using video processing, wherein the extracted video features comprise time-code locations of each segment of the plurality of segments within the content item.
 42. The system of claim 41, wherein the video provider, when identifying extracted visual features from the content item using video processing, is further configured to identify one or more of actors, characters, animals, objects, geographic locations, background, setting, theme, events, and scenes.
 43. The system of claim 35 wherein the video provider, when cross-referencing the database with an identifier for the content item, is further configured to: determine that the content item has been processed previously; in response to determining that the content item has been processed previously, cross-reference the identifier for the content item and an existing video signature.
 44. The system of claim 35 wherein the advertising distributor is further configured to select the relevant advertisement based at least in part on audio and visual features of a segment of the plurality of the segments.